Abstract

We consider an ensemble of K single-layer perceptrons exposed to random inputs and investigate 
the conditions under which the couplings of these perceptrons can be chosen such that prescribed 
correlations between the outputs occur. A general formalism is introduced using a multi-perceptron
cost function that allows one to determine the maximal number of random inputs as a function of the
desired values of the correlations. Replica-symmetric results for K = 2 and K = 3 are compared with
properties of two-layer networks of tree-structure and fixed Boolean function between hidden units 
and output. The results show which correlations in the hidden layer of multi-layer neural networks 
are crucial for the value of the storage capacity. 

1 Introduction 

One of the central tasks in the field of statistical mechanics of neural networks is a deeper understanding 
of the information processing abilities of multi-layer feed-forward networks (MLN). After a thorough
analysis of the single-layer perceptron it soon became clear that the very properties that entail the larger
computational power of MLN also make their theoretical description within the framework of statistical
mechanics much harder. Even the simplest case with just one hidden layer containing far fewer units
than the input layer and with a pre-wired Boolean function from the hidden layer to the output has
proven to be rather complicated to analyze exactly || [|, ||, ^| . It is therefore important to develop useful 
and reliable approximate methods to study these practically important systems. For the characterization 
of the generalization ability, bounds for the performance parameters have been shown to yield useful
orientations [Q, ||. For the storage capacity, i.e. the typical maximal number of random input-output 
mappings that can be implemented by the network only rather crude bounds exist so far, and these are 
independent of the hidden-to-output mapping (||). 

Let us start the discussion with a number of general open questions regarding the capacity of MLN. 
These questions, although only partially answered in the present work, may serve as a call for further 
investigation by the community of the statistical mechanics of neural networks. 

Correlations among the hidden units: The increased computational power of MLN stems from the 
possibility that the different subperceptrons between input and hidden layer can all operate in the region 
beyond their storage capacity. The errors typical of this regime can be compensated by other
subperceptrons. However, this "division of labour" only works appropriately if the errors do not occur for
all subperceptrons in the same patterns. Hence intricate correlations depending on the hidden-to-output
mapping develop in the hidden layer when the number of input-output pairs increases. This qualitative
picture has already been used to propose and analyze a learning algorithm for a special MLN, the
parity machine [15]. It has been observed for some time that the organization of internal representations
described by these correlations is crucial for the understanding of the storage and generalization abilities 
of MLN § [h], 0, |1| [B 



The approximation suggested in this work is to replace "division of labour" by "average division of 
labour". An approximate treatment of a MLN becomes possible if one does not require a definite mapping 
from the hidden layer to the output but instead prescribes the values for the correlations, i.e. the average 
relation between the hidden units and the output and also among the different hidden units themselves. 
The task is then to determine how many random inputs can be implemented by a set of K perceptrons, 
such that the outputs show definite correlations. 

Interplay between correlations and the capacity: This approach will highlight which types of correlations
are easy to implement and which are difficult, i.e. reduce the storage capacity significantly. It is already
known that increasing the average correlation between each one of the hidden units and the desired
output decreases the capacity. This result can be exemplified by the following well-known limits. The
lowest capacity is achieved for hidden units which are fully correlated with the desired outputs. In this 
case there is no division of labour and the MLN shrinks to a simple perceptron. The other limit is the 
parity machine, in which the correlation between each hidden unit and the output is zero. In this case 
the upper bound for the capacity of MLN with one hidden layer is achieved. Nevertheless, the general 
framework of how the capacity depends on the correlations between the output and a partial set of the 
hidden units is still unknown. The main problem is that with increasing K there is a trade-off between 
a more flexible division of labour and an increasing complexity of possible correlations. 

Possible scaling for the capacity: Of particular interest is the limit of an infinite number K of hidden
units, for which only few analytical results are known. For the AND machine the capacity is of $O(1)$ Jul,
whereas for the committee machine and the parity machine the capacity is of order $(\log K)^{\delta}$, with $\delta = 1/2$
[13] and $\delta = 1$ p|, respectively. These results may suggest one of the following two possible scenarios: In
the first scenario, the capacity varies continuously as a function of the hidden/output correlations. Any
$0 < \delta < 1$ can be found, depending on the correlations. In the second possible scenario, $\delta = 1$ holds for
the parity machine only, and all other hidden/output correlations result in a $\delta$ with a finite distance from
1.

Space of possible correlations: The simultaneous prescription of correlations involving several hidden
units has to take into account that not all combinations of correlations are possible, since they all derive
from a common probability distribution. The question of whether there are forbidden combinations of
correlations, and what their measure is, will be partially answered in the following discussion.

The paper is organized as follows. Section 2 sets the task and fixes the notation. In section 3
the formalism is presented, which is a generalization of the canonical phase space method developed
by Gardner and Derrida [18] for the single-layer perceptron. Section 4 contains general results for an
arbitrary number K of perceptrons with a special subset of fixed correlations. In sections 5 and 6 we
study in detail the situations of K = 2 and K = 3 perceptrons respectively and compare the results with
those known for tree-structured MLN with the same number of hidden units. Finally, section 7 comprises
our conclusions.



2 The storage problem for correlated perceptrons 

We consider $K$ spherical perceptrons with $N/K$ inputs, one output, and couplings $\mathbf{J}_k \in \mathbb{R}^{N/K}$,
$\mathbf{J}_k \cdot \mathbf{J}_k = N/K$, with $k = 1, \ldots, K$. Moreover we choose a set of $(\alpha N)K$ random inputs
$\boldsymbol{\xi}_k^{\nu} \in \mathbb{R}^{N/K}$ and one overall random output $\sigma^{\nu} = \pm 1$ with $\nu = 1, \ldots, \alpha N$. The total number
of random input and output bits is hence $\alpha N(N + 1)$ and the number of adjustable weights is $N$, as for
the standard perceptron and for multilayer networks with tree-structure and fixed Boolean function
between hidden units and output. The outputs of the $K$ perceptrons are given by

$$\tau_k^{\nu} = \mathrm{sgn}\left(\mathbf{J}_k \cdot \boldsymbol{\xi}_k^{\nu}\right). \qquad (1)$$




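Our aim is to determine the critical number $\alpha_c N$ of patterns for which coupling vectors $\mathbf{J}_k$ exist such
that the averages

$$c_1 = \langle \tau_k \sigma \rangle$$

$$c_2 = \langle \tau_k \tau_l \sigma \rangle$$

$$c_3 = \langle \tau_k \tau_l \tau_m \sigma \rangle = \frac{1}{\alpha N} \sum_{\nu=1}^{\alpha N} \tau_k^{\nu} \tau_l^{\nu} \tau_m^{\nu} \sigma^{\nu} \qquad (2)$$

$$\vdots$$

$$c_K = \langle \tau_1 \cdots \tau_K \sigma \rangle = \frac{1}{\alpha N} \sum_{\nu=1}^{\alpha N} \tau_1^{\nu} \cdots \tau_K^{\nu} \sigma^{\nu} \qquad (3)$$

have prescribed values $c_1, c_2, \ldots, c_K$. This can be seen as a generalization of the program of Gardner and
Derrida [18], who considered only one perceptron, i.e. $K = 1$, and determined $\alpha_c$ in dependence on the
fraction of errors $f_{GD}$ related to $c_1$ by $c_1 = 1 - 2f_{GD}$. The new aspect of the present investigation is that
not only the correlation of each individual output $\tau_k$ with $\sigma$ but also the correlation between different $\tau_k$
is taken into account.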

As usual we assume that the components of the input patterns $\boldsymbol{\xi}_k^{\nu}$ as well as the overall outputs $\sigma^{\nu}$
are independent random variables with zero mean and unit variance. The transformation $\boldsymbol{\xi}_k^{\nu} \to \sigma^{\nu} \boldsymbol{\xi}_k^{\nu}$
then preserves the statistical properties of the inputs. In the following we therefore take $\sigma^{\nu} = 1$ for all
$\nu = 1, \ldots, \alpha N$ without loss of generality.
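
The gauge argument can be checked numerically. The following is a minimal sketch, assuming binary inputs and Gaussian couplings purely for illustration (the names `output` and `gauge_check` are hypothetical helpers, not part of the paper): it verifies that the product of output and target on the original inputs equals the bare output on the transformed inputs $\sigma^{\nu}\boldsymbol{\xi}^{\nu}$, so that the target can be set to +1.

```python
import random

def sgn(x):
    return 1 if x >= 0 else -1

def output(J, xi):
    """Perceptron output sgn(J . xi) for one pattern."""
    return sgn(sum(j * x for j, x in zip(J, xi)))

def gauge_check(n=20, patterns=50, seed=0):
    """Compare tau*sigma on the original inputs with tau on the gauged inputs."""
    rng = random.Random(seed)
    # Illustrative assumption: Gaussian couplings and +-1 inputs.
    J = [rng.gauss(0.0, 1.0) for _ in range(n)]
    data = [([rng.choice((-1, 1)) for _ in range(n)], rng.choice((-1, 1)))
            for _ in range(patterns)]
    before = [output(J, xi) * s for xi, s in data]
    after = [output(J, [s * x for x in xi]) for xi, s in data]
    return before == after
```

Since the transformed inputs have the same statistics as the original ones, all averages over patterns are unchanged by this relabelling.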

Note that due to the independence of the inputs at different perceptrons all outputs have identical
statistical properties. Therefore the correlations $c_m$ as defined in (2) do not depend on the particular
subset of hidden units for which they are calculated. This corresponds to the permutation symmetry
between hidden units in MLN with appropriate decoder functions ([|[ |[ |[). 

It is in particular interesting to enforce correlations $c_m$ that are identical to those which develop spontaneously
in MLN with special Boolean functions between hidden layer and output. It has recently been
shown how these correlations can be calculated from the joint probability distribution of the stabilities
at the hidden units [14]. For the parity machine with $K$ hidden units one finds $c_m = 0$ for $m < K$
and $c_K = 1$. For the committee machine the expressions are more complicated; for $K = 3$ one finds
$c_1 = 5/12$, $c_2 = -1/6$, $c_3 = -3/4$.
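
To make the quoted numbers concrete, here is a small sketch that computes the correlations $c_m$ from a probability distribution over hidden-unit configurations (with the target output gauged to +1). The two distributions used below are assumptions chosen for illustration: for the $K = 3$ committee machine, weight 1/8 on (+ + +) and 7/24 on each configuration with a single negative unit; for the parity machine, uniform weight on the configurations with product +1. The function name `correlations` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def correlations(prob):
    """Return [c_1, ..., c_K] for a distribution over sign tuples.

    prob maps tuples with entries +1/-1 to probabilities; c_m is the
    average of the product of m hidden units, averaged over all subsets
    of size m (target output sigma = +1).
    """
    K = len(next(iter(prob)))
    cs = []
    for m in range(1, K + 1):
        subsets = list(combinations(range(K), m))
        total = sum(p * prod(tau[i] for i in s)
                    for tau, p in prob.items() for s in subsets)
        cs.append(total / len(subsets))
    return cs

# Assumed illustrative distributions (not taken from the paper).
committee = {(1, 1, 1): Fraction(1, 8),
             (-1, 1, 1): Fraction(7, 24),
             (1, -1, 1): Fraction(7, 24),
             (1, 1, -1): Fraction(7, 24)}
parity = {tau: Fraction(1, 4)
          for tau in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]}
```

With these assumptions `correlations(committee)` reproduces 5/12, -1/6, -3/4 and `correlations(parity)` gives 0, 0, 1.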

3 Formalism 

To analyze the storage abilities of correlated perceptrons we use a generalization of the formalism introduced
by Gardner and Derrida [18]. A well suited form for our purposes is the one proposed by Griniasty
and Gutfreund [19]. We are hence led to introduce a multiperceptron cost function pTJ

$$E(\mathbf{J}_1, \ldots, \mathbf{J}_K) = \sum_{\nu} V(\tau_1^{\nu}, \ldots, \tau_K^{\nu}) \qquad (4)$$

$$V(\tau_1, \ldots, \tau_K) = -\sum_{k} \tau_k + \mu_2 \sum_{(k,l)} \tau_k \tau_l + \mu_3 \sum_{(k,l,m)} \tau_k \tau_l \tau_m + \cdots + \mu_K\, \tau_1 \cdots \tau_K \qquad (5)$$



The parameters $\mu_m$ play the role of chemical potentials determining the costs for a violation of the
constraints on the correlations $c_m$. Our aim is to characterize the coupling vectors that minimize
$E(\mathbf{J}_1, \ldots, \mathbf{J}_K)$ and to find the critical threshold $\alpha_c$ for the number of inputs for which no couplings
$(\mathbf{J}_1, \ldots, \mathbf{J}_K)$ exist that realize the desired correlations. This can be done by calculating the free energy

$$f(\alpha, \beta, \mu_2, \ldots, \mu_K) = -\lim_{N \to \infty} \frac{1}{\beta N} \left\langle\!\left\langle \log \int \prod_{k=1}^{K} d\mu(\mathbf{J}_k) \exp(-\beta E(\mathbf{J}_1, \ldots, \mathbf{J}_K)) \right\rangle\!\right\rangle \qquad (6)$$

where $\langle\langle \cdots \rangle\rangle$ denotes the quenched average over the inputs and
$d\mu(\mathbf{J}) = (2\pi e)^{-N/2K} \prod_i dJ_i\, \delta(\sum_i J_i^2 - N/K)$ is the usual integration measure for spherical perceptrons. Then

$$g(\alpha_c, \mu_2, \ldots, \mu_K) = \lim_{\beta \to \infty} f(\alpha, \beta, \mu_2, \ldots, \mu_K) \qquad (7)$$

gives the typical minimum of $E(\mathbf{J}_1, \ldots, \mathbf{J}_K)$. The limit $\beta \to \infty$ corresponds to the saturation limit
$\alpha \to \alpha_c$. The values of the correlations $c_m$ defined in eq. (2) in this saturation limit are from (§, |)
given by

$$\frac{1}{\alpha_c}\, g(\alpha_c, \mu_2, \ldots, \mu_K) = -K c_1^{(s)} + \mu_2 \binom{K}{2} c_2^{(s)} + \cdots + \mu_K c_K^{(s)} \qquad (8)$$

$$\frac{1}{\alpha_c} \frac{\partial g(\alpha_c, \mu_2, \ldots, \mu_K)}{\partial \mu_k} = \binom{K}{k} c_k^{(s)}, \qquad k = 2, \ldots, K. \qquad (9)$$

Inverting these equations we find the saturation values $\alpha_c$ and $\mu_k$ as functions of $c_1, \ldots, c_K$, which is
what we were looking for.

The calculation of $g(\alpha_c, \mu_2, \ldots, \mu_K)$ proceeds along similar lines as for the single perceptron case
studied in [19]. Within replica symmetry one has to introduce an order parameter $q$ characterizing the
typical overlap between two coupling vectors that contribute significantly to the free energy (6). In the
limit $\beta \to \infty$ it is convenient to replace this order parameter by $x = \beta(1 - q)$. If the minimum of the
cost function is not degenerate we will find $q \to 1$ for $\beta \to \infty$ with $x$ remaining of order 1. Qualitatively $x$
describes the steepness of the minimum of the cost function. The smaller $x$, the fewer couplings contribute
significantly to the free energy for large $\beta$, i.e. the steeper the minimum of the cost function. Accordingly
$x = \infty$ corresponds to a degenerate minimum since $q \neq 1$ even for $\beta \to \infty$.

For all choices of the parameters $\mu_m$ there is a minimum $V_{min} = \min_{\{\tau_k\}} V(\tau_1, \ldots, \tau_K)$ of $V(\tau_1, \ldots, \tau_K)$
and hence $\alpha N V_{min}$ is a lower bound for the cost function $E(\mathbf{J}_1, \ldots, \mathbf{J}_K)$. Now consider the subset of
$\{\tau_k\}$-configurations that realize $V_{min}$ and calculate the correlations $c_m$ for this subset. The resulting values for
the $c_m$ are special in two respects. First, the value of $\alpha_c$ corresponding to them will occur for $x = \infty$ since
the minimum of $E$ is degenerate for $\alpha < \alpha_c$. Second, exactly these values of $c_m$ will occur in an MLN with
that Boolean function between hidden layer and output that maps all the $\{\tau_k\}$-configurations that realize
$V_{min}$ on the output +1. Consequently MLN with $K$ hidden units and fixed Boolean function between
hidden layer and output will show up as "pure cases" defined by $x = \infty$ at $\alpha_c$ in our analysis, and all
situations with $x < \infty$ can be interpreted as these pure cases above saturation. Changing the parameters
$\mu_m$ or equivalently the prescribed values of the $c_m$ will hence induce continuous transformations between
the different possible MLN.

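The main steps of the formal analysis are sketched in appendix A. The final result reads (cf. (30), (31))

$$g(\alpha_c, \mu_2, \ldots, \mu_K) = \mathop{\mathrm{extr}}_{x} \left[ -\frac{1}{2x} + \alpha_c \int \prod_{k=1}^{K} Dt_k\, F(x, t_k) \right] \qquad (10)$$

where

$$F(x, t_k) = \min_{\lambda_1, \ldots, \lambda_K} \left[ \frac{1}{2x} \sum_{k} (\lambda_k - t_k)^2 + V(\mathrm{sgn}(\lambda_1), \ldots, \mathrm{sgn}(\lambda_K)) \right] \qquad (11)$$

and $Dt = \exp(-t^2/2)\, dt/\sqrt{2\pi}$.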

The minimization in (11) is non-trivial. The quadratic terms in (11) are smallest for $\lambda_k = t_k$.
They compete with the step functions in $V(\mathrm{sgn}(\lambda_1), \ldots, \mathrm{sgn}(\lambda_K))$, giving rise to discontinuous jumps in $F$
whenever one $\lambda_k$ crosses zero. Closer inspection shows that for the global minimum one has

$$\lambda_k^0 = t_k \ \text{ or } \ \lambda_k^0 = \begin{cases} 0^+ & \text{if } t_k < 0 \\ 0^- & \text{if } t_k > 0 \end{cases} \qquad (12)$$

The saddlepoint equation which determines $x$ can be written in the form

$$\frac{1}{\alpha_c} = \int \prod_{k} Dt_k \sum_{k} (\lambda_k^0 - t_k)^2. \qquad (13)$$

Note that in this equation only those regions in the Gaussian integrals contribute for which $\lambda_k^0 \neq t_k$.
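
For small $K$ the minimization over the candidate values of each $\lambda_k$ (either $t_k$ itself or zero approached from the opposite side) can be done by exhaustive search over the $2^K$ combinations. The sketch below is an illustration under that reading of the text, not the authors' numerical procedure; `potential`, `minimize_F` and the dictionary `mu` (mapping the order $m \geq 2$ to $\mu_m$) are hypothetical names.

```python
from itertools import combinations, product
from math import prod

def potential(signs, mu):
    """V = -sum_k s_k + sum_m mu[m] * (sum over m-subsets of the product of signs)."""
    K = len(signs)
    v = -sum(signs)
    for m, mu_m in mu.items():
        v += mu_m * sum(prod(signs[i] for i in sub)
                        for sub in combinations(range(K), m))
    return v

def minimize_F(x, t, mu):
    """Brute-force minimum of sum_k (lam_k - t_k)^2 / (2x) + V(sgn(lam)).

    Each lam_k is either t_k (no cost) or zero on the other side of the
    origin (cost t_k^2 / (2x), sign opposite to t_k).
    Returns (F, sum_k (lam_k - t_k)^2, signs) at the minimum.
    """
    best = None
    for flips in product((False, True), repeat=len(t)):
        signs = [(-1 if f else 1) * (1 if tk >= 0 else -1)
                 for f, tk in zip(flips, t)]
        excess = sum(tk * tk for f, tk in zip(flips, t) if f)
        F = excess / (2 * x) + potential(signs, mu)
        if best is None or F < best[0]:
            best = (F, excess, signs)
    return best
```

The second returned value is the integrand of the saddlepoint equation for the given $t_k$; it vanishes whenever all $\lambda_k^0 = t_k$. For a single perceptron with $x = 1$, for instance, only fields in the interval $(-2, 0)$ are moved to zero.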



4 General results for prescribed highest and lowest correlation 

Of particular interest is the case in which only the values of $c_1$ and $c_K$ are prescribed, i.e. $\mu_2 = \mu_3 =
\ldots = \mu_{K-1} = 0$ in the cost function (5). It describes the interpolation between individual perceptrons
($\mu_K = 0$) and the parity machine ($\mu_K \to \pm\infty$), which is known to saturate the asymptotic upper bound
$\alpha_c = \log K / \log 2$ for the storage capacity for large $K$. This special case is also sufficient to discuss the
relation with the most important tree-structured MLN for $K = 2$ and $K = 3$. Moreover the necessary
algebra simplifies somewhat.

Let us first note that the correlation coefficients $c_1$ and $c_K$ are not independent of each other. It
is hence not possible to prescribe arbitrary values for them. According to their definition (2), (3) we
have always $c_1, c_K \in (-1, +1)$. Moreover it is sufficient to consider positive values of $c_1$ only, which is
guaranteed by the structure of the cost function (5). Finally the relation

$$c_K \geq K c_1 - (K - 1) \qquad (14)$$

must hold. It is a consequence of the obvious observation that the difference between $c_1$ and $c_K$ is maximal
if for every pattern at most one perceptron has negative output, which corresponds to the equality sign
in (14).
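
The bound can be checked pattern by pattern: for a single configuration of hidden outputs the contribution to $c_K$ is the product of the signs and the contribution to $K c_1$ is their sum, and averages over patterns inherit any inequality that holds for every configuration. A short sketch (the function name `tight_configurations` is hypothetical) enumerates all $2^K$ configurations and returns those where equality holds.

```python
from itertools import product
from math import prod

def tight_configurations(K):
    """Check prod(tau) >= sum(tau) - (K - 1) for all sign configurations.

    Returns the configurations for which equality holds.
    """
    tight = []
    for tau in product((1, -1), repeat=K):
        lhs = prod(tau)            # contribution to c_K
        rhs = sum(tau) - (K - 1)   # contribution to K*c_1 - (K - 1)
        if lhs < rhs:
            raise ValueError(f"bound violated for {tau}")
        if lhs == rhs:
            tight.append(tau)
    return tight
```

For every $K$ tried, the tight configurations are exactly the $K + 1$ configurations with at most one negative output, in line with the statement above.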
To perform the detailed analysis we denote $\mu_K$ simply by $\mu$ to get

$$E(\mathbf{J}_1, \ldots, \mathbf{J}_K) = \sum_{\nu} \left[ -\sum_{k} \tau_k^{\nu} + \mu \prod_{k} \tau_k^{\nu} \right]. \qquad (15)$$

Accordingly eq. (11) simplifies to

$$F(x, t_k) = \min_{\lambda_1, \ldots, \lambda_K} \left[ \frac{1}{2x} \sum_{k} (\lambda_k - t_k)^2 - \sum_{k} \mathrm{sgn}(\lambda_k) + \mu\, \mathrm{sgn}(\lambda_1 \lambda_2 \cdots \lambda_K) \right]. \qquad (16)$$

In appendix B the following expressions for the correlation coefficients $c_1$ and $c_K$ are derived:



ci = 1-2 Jf (2^i) 



/ 1 (H,x,Q)-/ 2 (|m|,x,0) 



c K = 2* [1/2 - H(2^)] K - KagaQj.) 



/i(|mUo) + /2(ImUo) 



Moreover the saddlepoint equation fixing x can be transformed into 

-2x J 



-1- = I_ J ff(2V5)-2V5^= + r 
A a c 2 V27T 2 



/i(| M U,l) + / 2 (| M |,x,l) 



(15) 

(16) 

(17) 
(18) 

(19) 



As usual we have used the abbreviation $H(x) = \int_x^{\infty} Dt$. $f_1(|\mu|, x, L)$ and $f_2(|\mu|, x, L)$ (with $L = 0, 1$) are
integrals over sums of products of error functions explicitly given in appendix B. The final analysis of
these equations has to be done numerically.

As discussed in the last section it is of particular interest to find the correlations $c_1$ and $c_K$ for which
$x = \infty$ at $\alpha_c$. From eqs. (17)-(19) and (p9|)-(p2]) we find the following results





        $\mu$        $c_1(x = \infty)$                $c_K(x = \infty)$      $1/\alpha_c(x = \infty)$
I       $\mu < 1$    $1$                              $1$                    $K/2$
II      $\mu = 1$    $1 - 2/K + 1/(K\, 2^{K-1})$      $-1 + 1/2^{K-1}$       $K/2 - K \int_{-\infty}^{0} Dt\, t^2\, [H(t)]^{K-1}$
III     $\mu > 1$    $1 - 2/K$                        $-1$                   $K/2 - K \int_{0}^{\infty} Dt\, t^2 \left( [H(-t)]^{K-1} - [H(t)]^{K-1} \right)$

Table 1: Correlation coefficients and storage capacity for an ensemble of K perceptrons in the pure cases
characterized by $x = \infty$ (see text).

Note that all three pairs $(c_1, c_K)|_{x = \infty}$ lie on the line given by (14); in fact (I) and (III) are the
endpoints of this line.
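
The capacities of the pure cases can be evaluated with elementary numerical integration. The sketch below assumes the following reading of Table 1: $1/\alpha_c = K/2$ for case I, $1/\alpha_c = K/2 - K\int_{-\infty}^{0} Dt\, t^2 [H(t)]^{K-1}$ for case II, and $1/\alpha_c = K/2 - K\int_{0}^{\infty} Dt\, t^2 ([H(-t)]^{K-1} - [H(t)]^{K-1})$ for case III. The integration limits are an assumption that was chosen because it reproduces the capacities quoted for $K = 2$ and $K = 3$ in sections 5 and 6; Simpson's rule with a finite cutoff is an implementation choice, and all function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt

def H(t):
    """H(t) = int_t^infinity Dz, the Gaussian tail probability."""
    return 0.5 * erfc(t / sqrt(2.0))

def gauss(t):
    return exp(-t * t / 2.0) / sqrt(2.0 * pi)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def alpha_case_I(K):
    return 2.0 / K

def alpha_case_II(K, cutoff=12.0):
    integral = simpson(lambda t: gauss(t) * t * t * H(t) ** (K - 1), -cutoff, 0.0)
    return 1.0 / (K / 2.0 - K * integral)

def alpha_case_III(K, cutoff=12.0):
    integral = simpson(
        lambda t: gauss(t) * t * t * (H(-t) ** (K - 1) - H(t) ** (K - 1)),
        0.0, cutoff)
    return 1.0 / (K / 2.0 - K * integral)
```

Under these assumptions `alpha_case_II(2)` and `alpha_case_III(2)` come out close to 11.01 and 5.50, and `alpha_case_II(3)` and `alpha_case_III(3)` close to 4.02 and 3.669.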

It is at first sight surprising that the parity machine does not occur in table 1. However, from the
structure of the cost function (15) it is clear that the internal representations of the parity function realize
$V_{min}$ only in the limit $\mu \to \pm\infty$. For finite $|\mu|$ the first term in (15) suppresses configurations with more
than one negative output and gives rise to case (I) or (III).



5 K = 2 

The simplest case to apply the above concepts is provided by two perceptrons with $N/2$ inputs each,
corresponding to $K = 2$. The only relevant correlations are $c_1$ and $c_2$ (see eqs. (2), (3)). The relative
importance of these in the cost function (15) is regulated by $\mu$.

Figure 1: Storage capacity $\alpha_c(c_1, c_2)$ for K = 2 correlated perceptrons.

Left: $\alpha_c(c_2)$ for $c_1 = 0$, 0.4, 0.5 and 0.6. Outside the shaded areas no solutions exist; dark shade
corresponds to $\mu > 1$, light shade to $\mu < 1$. The dashed-dotted line ($\mu = 0$) gives the location of the
maxima. The symbols denote the pure cases corresponding to the MLN summarized in table 2.
Right: $\alpha_c(c_1)$ for (from bottom to top): $c_2 = 1$, 0.8, 0.7, and 0.5 (dashed) and $c_2 = -0.8$, $-0.7$ and
$-0.5$ (full). The lines end at the thin line given by $c_2 = 2c_1 - 1$. The symbol corresponds to the parity
machine.

In fig. 1 (left) the dependence of $\alpha_c$ on $c_2$ for several values of $c_1$ is shown. Solutions exist only inside
the shaded areas, the boundaries of which correspond to $c_1 = 0$ and $c_2 = 2c_1 - 1$ respectively (cf. (14)). The
maxima of $\alpha_c(c_2)$ at constant $c_1$ occur for the uncorrelated system $\mu = 0$, implying $c_2 = c_1^2$, as expected
since an additional constraint on $c_2$ can only reduce $\alpha_c$. The values of $\alpha_c(c_1, c_1^2)$ at these maxima are
consistent with the results of Gardner and Derrida for the minimal fraction of errors $f_{GD} = (1 - c_1)/2$
[18].
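
The uncorrelated line $\mu = 0$ can be traced out parametrically in $x$. The sketch below rests on two assumptions that follow from the minimization for independent perceptrons but are not spelled out in this form in the text: a unit ends up with the wrong sign only if its field lies below $-2\sqrt{x}$, so that $c_1 = 1 - 2H(2\sqrt{x})$, and only fields in $(-2\sqrt{x}, 0)$ contribute to the saddlepoint equation, so that $1/\alpha_c = K \int_{-2\sqrt{x}}^{0} Dt\, t^2$. The function name `independent_point` is hypothetical.

```python
from math import erfc, exp, pi, sqrt

def H(t):
    """Gaussian tail probability int_t^infinity Dz."""
    return 0.5 * erfc(t / sqrt(2.0))

def independent_point(x, K):
    """Return (c_1, alpha_c) on the mu = 0 line for K independent perceptrons.

    Uses the closed form int_{-a}^{0} Dt t^2 = 1/2 - H(a) - a*exp(-a^2/2)/sqrt(2*pi)
    with a = 2*sqrt(x).
    """
    a = 2.0 * sqrt(x)
    c1 = 1.0 - 2.0 * H(a)
    inv_alpha = K * (0.5 - H(a) - a * exp(-a * a / 2.0) / sqrt(2.0 * pi))
    return c1, 1.0 / inv_alpha
```

For large $x$ the result approaches $c_1 = 1$ and $\alpha_c = 2/K$, i.e. $\alpha_c = 1$ for $K = 2$; smaller $x$ gives smaller $c_1$ and larger $\alpha_c$. On this line the higher correlations follow as $c_m = c_1^m$.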

Complementarily, the dependence $\alpha_c(c_1)$ for fixed $c_2$ is shown in the right part of fig. 1. Lines for $c_2$ and
$-c_2$ start at the same point for $c_1 = 0$. It corresponds to $\mu = \pm\infty$, where the value of $c_1$ has negligible
influence in the cost function (15). With increasing $c_1$ the value of $\alpha_c$ always decreases because additional
constraints are to be satisfied. These new constraints give rise to $c_1 > 0$ and are hence harder to satisfy
for negative values of $c_2$. Finally all lines end at the thin line given by $c_2 = 2c_1 - 1$.

The pure cases for $K = 2$ defined by $x = \infty$ at $\alpha_c$ are indicated by symbols in fig. 1. They correspond
to two-layer networks with two hidden units and fixed Boolean functions between hidden layer and output
and are summarized in table 2.

In our analysis the AND-machine denotes the situation in which the two perceptrons have to give
simultaneously the correct output $\sigma^{\nu} = +1$ for all patterns. The storage capacity is hence given by the
Gardner result, i.e. $\alpha_c = 1$, since each perceptron has $N/2$ couplings only. Note that the AND-machine
investigated previously in the literature has random outputs $\sigma^{\nu} = \pm 1$ and therefore the value for $\alpha_c$ is
different. The XOR function defines the $K = 2$ parity machine, for which the replica symmetric $\alpha_c$ was
obtained in earlier work. The result for the OR-machine is new; again it refers to the situation where all
random inputs have to be mapped on $\sigma = +1$. Finally let us note that there is another rather trivial pure
case given by $c_1 = c_2 = 0$ with $\alpha_c = \infty$, corresponding to the Boolean function that gives output +1 on
any input.

symbol in fig. 1    $c_1$    $c_2$     $\alpha_c(x = \infty)$    $\mu$     Boolean function
triangle            1        1         1                         < 1       AND
square              1/4      -1/2      11.01                     = 1       OR
circle              0        -1        5.50                      > 1       XOR

Table 2: Patterns of correlations for K = 2 perceptrons equivalent to two-layer networks with fixed
Boolean function between hidden units and output.

The results obtained for $K = 2$ are summarized in fig. 2, showing the region of allowed values in the
$c_1$-$c_2$-plane together with lines of constant $\alpha_c$ and constant $\mu$. The arrows at the lines of constant $\mu$ point
to smaller values of $\alpha_c$. The above discussed hidden unit machines are again marked by the symbols of
table 2. All other points can be interpreted as these machines above their storage capacity. Note that
the same point could be associated with different machines beyond saturation, since by prescribing the
correlations appropriately we can induce continuous transitions between different machines.

Figure 2: Contour map of $\alpha_c(c_1, c_2)$ and $\mu(c_1, c_2)$ for K = 2 correlated perceptrons.

Full lines correspond to $\alpha_c = 100$, $11.01 = \alpha_c^{OR}$, $5.50 = \alpha_c^{XOR}$ (from left to right), dashed lines to
$\mu = -10, -2, -1, 0, 0.99, 1.01, 2$, and $10$ (from bottom to top). Symbols denote the same MLN as in table 2.



6 K = 3 

A similar analysis can be performed for $K = 3$. As discussed in section 4 we set $\mu_2 = 0$ and denote $\mu_3$
simply by $\mu$. Similar to the last section we can then determine $\alpha_c(c_1, c_3)$ from a numerical analysis of
eqs. (17)-(19).

Fig. 3 (left) shows the dependence of the critical storage capacity $\alpha_c$ on $c_3$ for fixed values of $c_1$. The
dependencies are rather similar to the case $K = 2$ shown in the left part of fig. 1. Again solutions $c_1(\alpha_c, c_3)$
exist only in the shaded areas. The maxima of the $\alpha_c(c_3)$-curves lie on the dashed-dotted line corresponding
to independent perceptrons ($\mu = 0$). They are hence characterized by $c_3 = c_1^3$ and are again consistent
with the Gardner-Derrida results on the minimal fraction of errors for perceptrons above saturation [18].

Complementarily, the dependence $\alpha_c(c_1)$ for fixed values of $c_3$ is shown in the right part of Fig. 3.
Again similar to the case $K = 2$ we find that $\alpha_c$ decreases with increasing $c_1$. In particular the lines
for $c_3 = \pm 1$ show how the storage capacity decreases from the value of the $K = 3$ parity machine at
$c_1 = 0$ if additional constraints showing up in $c_1 > 0$ are included. All lines end at the thin line given by
$c_3 = 3c_1 - 2$.




Figure 3: Storage capacity $\alpha_c(c_1, c_3)$ for K = 3 correlated perceptrons.

Left: $\alpha_c(c_3)$ for $c_1 = 0$, 1/3, 5/12 and 3/5. Outside the shaded areas no solutions exist; dark shade
corresponds to $\mu > 1$, light shade to $\mu < 1$. The dashed-dotted line ($\mu = 0$) gives the location of the
maxima. The symbols denote the pure cases corresponding to the MLN summarized in table 3.
Right: $\alpha_c(c_1)$ for (from bottom to top): $c_3 = 1$, 0.9, and 0.5 (dashed) and $c_3 = -1$, $-0.9$ and $-0.5$ (full).
The lines end at the thin line given by $c_3 = 3c_1 - 2$. The symbol corresponds to the machine giving
overall output +1 only if exactly one hidden unit is $-1$.

The symbols in fig. 3 refer again to pure cases with $x = \infty$ at $\alpha_c$ corresponding to the MLN summarized
in table 3. In addition to the AND- and parity-machine we now have the committee-machine and a machine
with the Boolean function for which the output is +1 if exactly one hidden unit is $-1$.



symbol in fig. 3    $c_1$    $c_3$     $\alpha_c(x = \infty)$    $\mu$            Boolean function
triangle            1        1         2/3                       < 1              AND
star                5/12     -3/4      4.02                      = 1              COMMITTEE
diamond             1/3      -1        3.669                     > 1              (- + +), (+ - +), (+ + -)
circle              0        ±1        10.37                     = ±$\infty$      PARITY

Table 3: Patterns of correlations for K = 3 perceptrons equivalent to two-layer networks with fixed
Boolean function between hidden units and output.

We can again summarize the results in a contour plot showing lines of constant $\alpha_c$ and $\mu$ in the
$c_1$-$c_3$-plane, fig. 4. Only combinations of $c_1$ and $c_3$ that belong to the shaded areas are possible; light shade
corresponds to $\mu < 1$, dark shade to $\mu > 1$. The arrows at the dashed lines of constant $\mu$ point again into
regions of lower $\alpha_c$; the symbols are those of table 3. Large values of $c_1$ imply a strong correlation of every
perceptron with the common output and therefore give small $\alpha_c$ and a narrow interval of consistent values
of $c_3$. Relaxing the constraint on $c_1$ allows a more efficient "division of labour" between the perceptrons
and results in a broader spectrum of $c_3$-values and enhanced storage capacity. Accordingly the largest
values of $\alpha_c$ are possible for $c_1 = 0$. Then $\alpha_c$ only depends on $c_3$ and, starting from the value 10.37 for
the parity machine at $c_3 = \pm 1$, it increases without bound with decreasing $|c_3|$.

Figure 4: Contour map of $\alpha_c(c_1, c_3)$ and $\mu(c_1, c_3)$ for K = 3 correlated perceptrons.

Full lines correspond to $\alpha_c = 100$, $10.37 = \alpha_c^{PAR}$, $4.02 = \alpha_c^{COM}$ and $2$ (from left to right), dashed lines
to $\mu = -10, -2, -1, 0, 0.99, 1.01, 2$, and $10$ (from bottom to top). Symbols denote the same MLN as in
table 3.

A new aspect of the case $K = 3$ is that there is a correlation coefficient, $c_2$, that was not prescribed
(since we put $\mu_2 = 0$). It is nevertheless of interest to know the value of $c_2$ that corresponds to different
choices of $c_1$ and $c_3$. The easiest way to obtain $c_2$ is via a maximum entropy argument. This is sketched
in appendix C. The result is

$$c_2 = -\frac{1}{2} + \sqrt{\frac{1}{4} + c_1^2 + c_1 c_3}. \qquad (20)$$

It is interesting to note that for the values $c_1 = 5/12$ and $c_3 = -3/4$ characteristic for the committee
machine this formula gives $c_2 = -1/6$, which is in fact the correct result [14]. The committee function for
$K = 3$ does hence not imply constraints on $c_2$ and is already uniquely characterized by the values of $c_1$
and $c_3$.
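
The maximum entropy formula is easy to check numerically. The sketch below assumes that eq. (20) reads $c_2 = -1/2 + \sqrt{1/4 + c_1^2 + c_1 c_3}$ (the printed form is partly illegible, and this reading is chosen because it reproduces the committee value quoted above). The function name `c2_max_entropy` is hypothetical.

```python
from math import sqrt

def c2_max_entropy(c1, c3):
    """c_2 from c_1 and c_3 for K = 3, assuming c_2 = -1/2 + sqrt(1/4 + c1^2 + c1*c3)."""
    return -0.5 + sqrt(0.25 + c1 * c1 + c1 * c3)

committee_c2 = c2_max_entropy(5 / 12, -3 / 4)      # committee machine values
independent_c2 = c2_max_entropy(0.3, 0.3 ** 3)     # independent perceptrons, c3 = c1^3
```

Besides the committee value $-1/6$, the formula returns $c_2 = c_1^2$ on the line $c_3 = c_1^3$ of independent perceptrons, and $c_2 = 0$ for the parity machine ($c_1 = 0$).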



7 Conclusions 

In the present paper we have considered ensembles of K perceptrons with random inputs and investigated 
the possibility to choose the couplings such that prescribed correlations c m between the outputs of 
the perceptrons occur. For any combinatorially possible combination of $c_m$ there is a critical value
$\alpha_c(c_1, \ldots, c_K)$ and solutions for the couplings of the perceptrons exist if the number of inputs is less
than $N\alpha_c$. This investigation establishes a relation between the results for single perceptrons above
their storage capacity and those for several MLN with tree-structure and K hidden units and fixed
Boolean function between hidden layer and output. Similar ideas were pursued in [15] and [11] where
approximate expressions for the storage capacity of a parity machine and committee machine respectively 
were obtained from the results of Gardner and Derrida on the minimal fraction of errors of perceptrons 
beyond saturation and in p2| where analogies between a committee machine and noisy perceptrons were 
investigated. The new aspect in the present paper is that also the influence of higher correlations that 
are known to be important for the storage abilities was taken into account. The results show which 
correlations are difficult to implement and are therefore important for the determination of the storage 
capacity and which are easy and therefore not very restrictive. A detailed analysis was carried out for 
K = 2 and K = 3. 

The technique used is a generalization of the canonical phase space analysis introduced by Gardner 
and Derrida. The results were obtained within the replica symmetric ansatz. They should hence be seen
as a mere first orientation since it is well known that replica symmetry breaking (RSB) is crucial for
both the description of perceptrons above saturation and the storage abilities of MLN [|[ ||, [| . An 
investigation of the problem within RSB, though highly desirable, seems technically rather involved. Also
the extension of the analysis to the asymptotic behaviour for $K \to \infty$ would be very interesting and
would hopefully shed some light on the still controversial problem of the storage capacity of MLN in this 
limit. 
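
8 Appendix A

In this appendix we outline the calculation of the free energy eq. (6) corresponding to the cost function
eq. (5) within replica symmetry. To this end we employ a generalization of the formalism of Griniasty
and Gutfreund [19].

To perform the average over the random patterns we use the replica trick

$$f(\mu_2, \ldots, \mu_K, \beta) = -\frac{1}{\beta N} \langle\langle \ln Z \rangle\rangle = -\frac{1}{\beta N} \lim_{n \to 0} \frac{\langle\langle Z^n \rangle\rangle - 1}{n} \qquad (21)$$

involving the partition function $Z$

$$Z = \int \prod_{k} d\mathbf{J}_k\, \delta(\mathbf{J}_k^2 - N/K) \int_{-\infty}^{\infty} \prod_{k,\nu} d\lambda_k^{\nu}\, \delta\!\left(\lambda_k^{\nu} - \sqrt{K/N}\, \mathbf{J}_k \cdot \boldsymbol{\xi}_k^{\nu}\right) e^{-\beta \sum_{\nu} V(\lambda_1^{\nu}, \ldots, \lambda_K^{\nu})} \qquad (22)$$

$$V(\lambda_1^{\nu}, \ldots, \lambda_K^{\nu}) = -\sum_{k} \mathrm{sgn}(\lambda_k^{\nu}) + \mu_2 \sum_{(k,l)} \mathrm{sgn}(\lambda_k^{\nu} \lambda_l^{\nu}) + \cdots + \mu_K\, \mathrm{sgn}(\lambda_1^{\nu} \cdots \lambda_K^{\nu})$$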
Introducing integral representations for the $\delta$-functions and performing the average over the patterns we
find



OO OO OO 

= / n ^ J n J n 



i^fr(Q fc A fe )+G 2 (^ b ,^) 



TV 

' 1 77 



■aNdiQL.QK. 



where 



G 2 (Ff,E%) = -i^[ n + tr(ln A*)], 



(23) 



(24) 



and 

G\{Qi-Qk) =ln 



/ II / II ?£r -p N E y*^ - 5 E y*o*y? + p E n*?. A * 



Here A^ = (A^, • • • , X k ) and y k = (y k , • • • , y£) and we have used the matrices Q k and 



1 qf \ ( iE% -iFf 



(25) 



(26) 



where as usual $q_k^{ab}$ describes the overlap between two replicas $a, b$ in the coupling space of perceptron $k$,

$$q_k^{ab} = \mathbf{J}_k^a \cdot \mathbf{J}_k^b\, K/N. \qquad (27)$$

The saddlepoint for $E_k^a$ and $F_k^{ab}$ is given by $Q_k^{-1} = A_k$, resulting in

$$\langle\langle Z^n \rangle\rangle \sim \int \prod_{a<b;\,k} dq_k^{ab}\, \exp\left\{ \frac{N}{2K} \sum_{k} \ln(\det Q_k) + \alpha N G_1(Q_1, \ldots, Q_K) \right\}. \qquad (28)$$

To evaluate the remaining saddle point integral we use the replica symmetric ansatz $q_k^{ab} = q_k$ for all $a \neq b$.
Moreover we expect permutation symmetry between the different perceptrons, implying $q_k = q$ for all
$k = 1, \ldots, K$. Then $\ln(\det Q) = n(\ln(1 - q) + q/(1 - q))$ and for $G_1(q_1, \ldots, q_K)$ it follows

$$G_1(q) = n \int_{-\infty}^{\infty} \prod_{k} Dt_k\, \ln \int_{-\infty}^{\infty} \prod_{k} \frac{d\lambda_k}{\sqrt{2\pi(1 - q)}}\, \exp\left( -\sum_{k} \frac{(\lambda_k - t_k \sqrt{q})^2}{2(1 - q)} - \beta V(\lambda_1, \ldots, \lambda_K) \right). \qquad (29)$$

In order to calculate the function $g(\alpha_c, \mu_2, \ldots, \mu_K)$, eq. (7), we have to consider the saturation limit
$\beta \to \infty$. It is convenient then to use the rescaled saddle point variable $x = \beta(1 - q)$ instead of $q$. In this
way we obtain

$$g(\alpha_c, \mu_2, \ldots, \mu_K) = \mathop{\mathrm{extr}}_{x} \left[ -\frac{1}{2x} + \alpha_c \int_{-\infty}^{\infty} \prod_{k} Dt_k\, F(x, t_1, t_2, \ldots, t_K) \right] \qquad (30)$$

$$F(x, t_1, t_2, \ldots, t_K) = \min_{\lambda_1, \ldots, \lambda_K} \left[ \sum_{k} \frac{(\lambda_k - t_k)^2}{2x} + V(\lambda_1, \ldots, \lambda_K) \right] \qquad (31)$$

which coincides with (10). The saddlepoint equation eq. (13) determining $x$ follows by explicit differentiation
of eq. (30) with respect to $x$.



9 Appendix B 



In this appendix we sketch the main steps of the derivation of the saddle point equation (13) and of the
free energy (10) for the case that only $c_1$ and $c_K$ are prescribed. We also give the explicit expressions for
$c_1$ and $c_K$ as a function of $\alpha_c$ and $\mu$.

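The calculation of $g$ as given by (|3^,|l^) requires minimization of

$$F(x, t_1, t_2, \ldots, t_K) = \min_{\lambda_1, \lambda_2, \ldots, \lambda_K} \left[ \sum_{k=1}^{K} \frac{(\lambda_k - t_k)^2}{2x} - \sum_{k=1}^{K} \mathrm{sgn}(\lambda_k) + \mu \prod_{k=1}^{K} \mathrm{sgn}(\lambda_k) \right]. \qquad (32)$$

From eq. (12) we have

$$\mathrm{sgn}(\lambda_k^0) = \begin{cases} \mathrm{sgn}(t_k) & \text{if } \lambda_k^0 = t_k \\ -\mathrm{sgn}(t_k) & \text{else.} \end{cases} \qquad (33)$$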



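Eq. (32) then becomes

$$F = \min_{\{\lambda_k^0\}} \left[ \mu\, S(\lambda^0) - \sum_{k=1}^{K} \mathrm{sgn}(t_k) + \sum_{k:\, \lambda_k^0 = 0^{\pm}} \left( \frac{t_k^2}{2x} + 2\, \mathrm{sgn}\, t_k \right) \right]. \qquad (34)$$

Here $S(\lambda^0) = \prod_{k=1}^{K} \mathrm{sgn}(\lambda_k^0) = (-1)^m \prod_{k=1}^{K} \mathrm{sgn}(t_k)$, where $m$ counts all $\lambda_k^0 = 0^{\pm}$. The last sum in eq. (34)
has only contributions from those $\lambda_k^0$ with $\lambda_k^0 = 0^{\pm}$.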

To minimize $F$ for given $t_1, \ldots, t_K$ we hence have to find which of the $2^K$ configurations $\{\lambda_1^0, \ldots, \lambda_K^0\}$, $\lambda_k^0 \in \{0^\pm, t_k\}$, minimizes eq.(34). A suitable procedure to do this is as follows. We first make the last term in eq.(34) as small as possible. That is, for all $t_j$ with $t_j \in (-2\sqrt{x}, 0)$ we choose for a first try $\lambda_j^0 = 0^+$. We denote the resulting value for $S(\vec{\lambda}^0)$ by $S^*$. ($S^* = (-1)^\eta$ where $\eta$ is the number of all $t_k < -2\sqrt{x}$.) If $\mu S^* < 0$ the optimal configuration has already been found because the first summand is at its minimum as well. If on the other hand $\mu S^* > 0$ there is competition between the first and the last term in eq.(34). One may then change the sign of $S(\vec{\lambda}^0)$ in order to lower $F(x, t_1, t_2, \ldots, t_K)$ by $2|\mu|$, by either setting a single $\lambda_i^0 = 0^\pm$ although $t_i \notin (-2\sqrt{x}, 0)$ or setting a single $\lambda_i^0 = t_i$ for one $t_i \in (-2\sqrt{x}, 0)$. The corresponding changes in $F$ are $2(w(t_i) - |\mu|)$ where

$$w(t) = \begin{cases} t^2/4x - 1 & \text{if } t \in (-\infty, -2\sqrt{x}) \\ -t^2/4x + 1 & \text{if } t \in (-2\sqrt{x}, 0) \\ t^2/4x + 1 & \text{if } t \in (0, \infty). \end{cases} \qquad (35)$$
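The procedure just described can be checked against an exhaustive search over all $2^K$ candidate configurations. The sketch below is only an illustration: the function names and the encoding of a configuration as a list of "flipped" flags are assumptions made here, with a flipped entry standing for $\lambda_k^0 = 0^\pm$ on the side opposite to $t_k$ and an unflipped entry for $\lambda_k^0 = t_k$.

```python
import itertools
import math

def sign(v):
    return 1.0 if v > 0 else -1.0

def energy(t, flipped, x, mu):
    """Bracket of eq. (32) for one candidate configuration.

    flipped[k] = False: lambda_k = t_k (no quadratic cost, sign of t_k).
    flipped[k] = True : lambda_k = 0 on the side opposite to t_k
                        (quadratic cost t_k^2 / 2x, sign -sgn(t_k)).
    """
    quad = sum(tk * tk / (2.0 * x) for tk, f in zip(t, flipped) if f)
    signs = [(-sign(tk) if f else sign(tk)) for tk, f in zip(t, flipped)]
    return quad - sum(signs) + mu * math.prod(signs)

def f_brute_force(t, x, mu):
    """Minimum over all 2^K stay/flip configurations."""
    choices = itertools.product((False, True), repeat=len(t))
    return min(energy(t, flips, x, mu) for flips in choices)

def w(t, x):
    """Half the cost of toggling one variable relative to the first guess."""
    if t < -2.0 * math.sqrt(x):
        return t * t / (4.0 * x) - 1.0
    if t < 0.0:
        return 1.0 - t * t / (4.0 * x)
    return t * t / (4.0 * x) + 1.0

def f_procedure(t, x, mu):
    """First guess plus at most one toggle, as described in the text."""
    edge = 2.0 * math.sqrt(x)
    # First guess: flip exactly those t_k lying in (-2 sqrt(x), 0).
    flipped = [(-edge < tk < 0.0) for tk in t]
    s_star = math.prod(-sign(tk) if f else sign(tk)
                       for tk, f in zip(t, flipped))
    if mu * s_star > 0.0:
        # Competition: toggle the cheapest variable if that pays off.
        i = min(range(len(t)), key=lambda k: w(t[k], x))
        if w(t[i], x) < abs(mu):
            flipped[i] = not flipped[i]
    return energy(t, flipped, x, mu)

if __name__ == "__main__":
    t = [0.5, -0.3, -3.0]
    print(f_brute_force(t, 1.0, 0.5), f_procedure(t, 1.0, 0.5))
```

A single toggle suffices because every toggle costs $2w(t_i) > 0$, while only the parity of the number of toggles affects the sign of $S$.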
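To the saddle point equation (13) only those regions in the integral contribute for which $\lambda_j^0 \neq t_j$ for at least one $j$. Formalizing the above consideration we find

$$\frac{1}{\alpha_c} = \int_{-\infty}^{\infty} \!\cdots\! \int_{-\infty}^{\infty} Dt_1 \ldots Dt_K \left[ \sum_{k=1}^{K} t_k^2\, \Theta_1(t_k) + K\, \Theta(\mu S^*)\, \Theta(|\mu| - w(t_1))\, t_1^2\, (-1)^{\Theta_1(t_1)} \prod_{k=2}^{K} \Theta(w(t_k) - w(t_1)) \right] \qquad (36)$$

$$\Theta_1(t) = \begin{cases} 1 & \text{if } t \in (-2\sqrt{x}, 0) \\ 0 & \text{else.} \end{cases} \qquad (37)$$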



The first term of eq.(36) stems from our first guess minimizing the last term of eq.(34) only. The various Theta-functions in the term that contributes only for $\mu S^* > 0$ implement the different cases discussed in the context of eq.(35). The integration variables can always be renamed such that $t_1 < t_2 < \ldots < t_K$ without loss of generality.

The integrations over $t_2, \ldots, t_K$ yield a product of sums of two error functions. Finally the saddle point equation reads

$$\frac{1}{K \alpha_c} = \frac{1}{2} - H(2\sqrt{x}) - 2\sqrt{\frac{x}{2\pi}}\; e^{-2x} + \frac{1}{2} \left[ f_1(|\mu|, x, 1) + f_2(|\mu|, x, 1) \right] \qquad (38)$$



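$$f_1(|\mu| < 1, x, L) = [-1]^L \int_{-2\sqrt{x}}^{-2\sqrt{x(1-|\mu|)}} Dt_1\, t_1^{2L} \left\{ \left[ H(t_1) + H_m(t_1) \right]^{K-1} + \mathrm{sgn}\,\mu \left[ H(t_1) - H_m(t_1) \right]^{K-1} \right\} \qquad (39)$$

$$\begin{aligned} f_1(|\mu| > 1, x, L) = {} & \int_{0}^{2\sqrt{x(|\mu|-1)}} Dt_1\, t_1^{2L} \left\{ \left[ H(t_1) + H_p(t_1) \right]^{K-1} + \mathrm{sgn}\,\mu \left[ H(t_1) - H_p(t_1) \right]^{K-1} \right\} \\ & + [-1]^L \int_{-2\sqrt{x}}^{0} Dt_1\, t_1^{2L} \left\{ \left[ H(t_1) + H_m(t_1) \right]^{K-1} + \mathrm{sgn}\,\mu \left[ H(t_1) - H_m(t_1) \right]^{K-1} \right\} \end{aligned} \qquad (40)$$

where we introduced the abbreviations $H_p(t_1) = H(\sqrt{8x + t_1^2})$, $H_m(t_1) = H(\sqrt{|8x - t_1^2|})$ and $H_m^-(t_1) = H(-\sqrt{|8x - t_1^2|})$. As usual $H(t) = \int_t^{\infty} Dt'$. Similarly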



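$$f_2(|\mu| < 1, x, L) = \int_{-2\sqrt{x(1+|\mu|)}}^{-2\sqrt{x}} Dt_1\, t_1^{2L} \left\{ \left[ H_m^-(t_1) + H(-t_1) \right]^{K-1} - \mathrm{sgn}\,\mu \left[ H_m^-(t_1) - H(-t_1) \right]^{K-1} \right\} \qquad (41)$$

$$\begin{aligned} f_2(|\mu| > 1, x, L) = {} & \int_{-2\sqrt{2x}}^{-2\sqrt{x}} Dt_1\, t_1^{2L} \left\{ \left[ H_m^-(t_1) + H(-t_1) \right]^{K-1} - \mathrm{sgn}\,\mu \left[ H_m^-(t_1) - H(-t_1) \right]^{K-1} \right\} \\ & + \int_{-2\sqrt{x(1+|\mu|)}}^{-2\sqrt{2x}} Dt_1\, t_1^{2L} \left\{ \left[ H_m(t_1) + H(-t_1) \right]^{K-1} - \mathrm{sgn}\,\mu \left[ H_m(t_1) - H(-t_1) \right]^{K-1} \right\} \end{aligned} \qquad (42)$$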



A common feature of eqs. (39)-(42) is that in the binomial expression those terms cancel which correspond to regions with $\mu S^* < 0$.
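The $\mu$-independent part of the saddle point equation comes from the Gaussian integral of $t^2$ over the interval $(-2\sqrt{x}, 0)$ selected by the first guess. Integration by parts gives $\int_{-2\sqrt{x}}^{0} Dt\, t^2 = \frac{1}{2} - H(2\sqrt{x}) - \sqrt{2x/\pi}\; e^{-2x}$. The sketch below compares this closed form with a numerical quadrature. It assumes the standard conventions $Dt = dt\, e^{-t^2/2}/\sqrt{2\pi}$ and $H(t) = \int_t^\infty Dt'$; the function names are hypothetical.

```python
import math

def H(a):
    """Gaussian tail H(a) = int_a^infinity Dt (assumed convention)."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def closed_form(x):
    """1/2 - H(2 sqrt(x)) - sqrt(2x/pi) * exp(-2x)."""
    return (0.5 - H(2.0 * math.sqrt(x))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-2.0 * x))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def numeric(x):
    """Direct quadrature of t^2 against the Gaussian measure."""
    def integrand(t):
        return t * t * math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
    return simpson(integrand, -2.0 * math.sqrt(x), 0.0)

if __name__ == "__main__":
    for x in (0.1, 1.0, 2.5):
        print(x, closed_form(x), numeric(x))
```

For large $x$ the expression tends to $1/2$, the full second moment of the negative half-line.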

The calculation of $g$ proceeds along similar lines.



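$$\begin{aligned} g/\alpha_c = {} & \int_{-\infty}^{\infty} \!\cdots\! \int_{-\infty}^{\infty} Dt_1 \ldots Dt_K \left( \mu\, S(\vec{\lambda}^0) - \sum_{k=1}^{K} \mathrm{sgn}(t_k) + 2 \sum_{j,\; \lambda_j^0 = 0^\pm} \mathrm{sgn}\, t_j \right) \\ = {} & \int_{-\infty}^{\infty} \!\cdots\! \int_{-\infty}^{\infty} Dt_1 \ldots Dt_K \Bigg( \mu S^* - 2 \sum_{k=1}^{K} \Theta_1(t_k) + 2K\, \Theta(\mu S^*)\, \Theta(|\mu| - w(t_1)) \\ & \qquad \times \left[ -\mu S^* + (-1)^{\Theta_1(t_1)}\, \mathrm{sgn}(t_1) \right] \prod_{k=2}^{K} \Theta(w(t_k) - w(t_1)) \Bigg) \end{aligned} \qquad (43)$$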



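We find

$$\begin{aligned} g/\alpha_c = {} & -2K\, E(2\sqrt{x}) + 2^K \mu\, E^K(2\sqrt{x}) \\ & + K \left[ f_1(|\mu|, x, 0) - f_2(|\mu|, x, 0) \right] - K |\mu| \left[ f_1(|\mu|, x, 0) + f_2(|\mu|, x, 0) \right] \end{aligned} \qquad (44)$$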



Performing the derivative of $g/\alpha_c$ with respect to $\mu$ one realizes that there is no contribution from the $\mu$-dependence of the integration limits in eqs. (39)-(42). Hence the expression (44) for $g/\alpha_c$ is already of the form $g/\alpha_c = -K c_1 + \mu\, c_K$ and we arrive at eq. (17) ff. for the correlation coefficients $c_1$ and $c_K$.



10 Appendix C 

To determine $c_2$ for given values of $c_1$ and $c_3$ we look for the probability distribution $P(\tau_1, \tau_2, \tau_3)$ that for the given values of $c_1$ and $c_3$ realizes the maximal entropy. Because of the permutation symmetry between the perceptrons we only have to determine the probabilities $p_k$ of output configurations with $k$ negative outputs, where $k = 0, \ldots, 3$. Hence we have to maximize

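$$\begin{aligned} S = {} & -p_0 \log p_0 - 3 p_1 \log p_1 - 3 p_2 \log p_2 - p_3 \log p_3 \\ & + \lambda_0\, (p_0 + 3 p_1 + 3 p_2 + p_3 - 1) \\ & + \lambda_1\, (p_0 + p_1 - p_2 - p_3 - c_1) \\ & + \lambda_3\, (p_0 - 3 p_1 + 3 p_2 - p_3 - c_3) \end{aligned} \qquad (45)$$

where the $\lambda_i$ are Lagrange multipliers incorporating the constraints. Performing the derivatives with respect to the $p_k$ yields

$$p_0\, p_3 = p_1\, p_2. \qquad (46)$$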
Using the constraints to solve for the $p_k$ gives

$$c_2 = -\frac{1}{2} \pm \sqrt{\frac{1}{4} + c_1^2 + c_1 c_3}$$

where only the upper sign gives rise to positive values for all $p_k$.
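The maximum-entropy result can be verified numerically. The sketch below assumes the constraint conventions of eq. (45), with $p_k$ the probability of one specific output configuration with $k$ negative outputs, and $c_2 = p_0 - p_1 - p_2 + p_3$ for the pair correlation. The function names are illustrative and not taken from the paper.

```python
import math

def maxent_c2(c1, c3):
    """Upper-sign solution c2 = -1/2 + sqrt(1/4 + c1^2 + c1*c3)."""
    return -0.5 + math.sqrt(0.25 + c1 * c1 + c1 * c3)

def probabilities(c1, c2, c3):
    """Solve normalization and the three correlation constraints for p_k.

    p_k is the probability of ONE configuration with k negative outputs,
    so there are 1, 3, 3, 1 configurations for k = 0, 1, 2, 3.
    """
    a = (1.0 + 3.0 * c2) / 4.0   # p0 + p3
    b = (1.0 - c2) / 4.0         # p1 + p2
    c = (3.0 * c1 + c3) / 4.0    # p0 - p3
    d = (c1 - c3) / 4.0          # p1 - p2
    return ((a + c) / 2.0, (b + d) / 2.0, (b - d) / 2.0, (a - c) / 2.0)

def entropy(p):
    """Entropy of the 8 output configurations, grouped by multiplicity."""
    weights = (1.0, 3.0, 3.0, 1.0)
    return -sum(wt * pk * math.log(pk) for wt, pk in zip(weights, p))

if __name__ == "__main__":
    c1, c3 = 0.2, 0.1
    c2 = maxent_c2(c1, c3)
    p = probabilities(c1, c2, c3)
    print("c2 =", c2)
    print("p0*p3 =", p[0] * p[3], " p1*p2 =", p[1] * p[2])
    # Moving c2 away from the stationary value lowers the entropy.
    for shift in (-0.01, 0.0, 0.01):
        print(shift, entropy(probabilities(c1, c2 + shift, c3)))
```

For independent outputs with $c_1 = m$ and $c_3 = m^3$ the formula reduces to $c_2 = m^2$, as expected for a factorizing distribution.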